How degenerate is the parametrization of neural networks with the ReLU activation function?
Neural network training is usually accomplished by solving a non-convex
optimization problem using stochastic gradient descent. Although one optimizes
over the network's parameters, the main loss function generally only depends on
the realization of the neural network, i.e. the function it computes. Studying
the optimization problem over the space of realizations opens up new ways to
understand neural network training. In particular, usual loss functions like
mean squared error and categorical cross entropy are convex on spaces of neural
network realizations, which themselves are non-convex. Approximation
capabilities of neural networks can be used to deal with the latter
non-convexity, which allows us to establish that for sufficiently large
networks local minima of a regularized optimization problem on the realization
space are almost optimal. Note, however, that each realization has many
different, possibly degenerate, parametrizations. In particular, a local
minimum in the parametrization space need not correspond to a local minimum in
the realization space. To establish such a connection, inverse stability of the
realization map is required, meaning that proximity of realizations must imply
proximity of corresponding parametrizations. We present pathologies which
prevent inverse stability in general, and, for shallow networks, proceed to
establish a restricted space of parametrizations on which we have inverse
stability w.r.t. a Sobolev norm. Furthermore, we show that by optimizing
over such restricted sets, it is still possible to learn any function which can
be learned by optimization over unrestricted sets.
Comment: Accepted at NeurIPS 201
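The remark that one realization has many parametrizations can be illustrated with a small sketch. It uses one well-known source of degeneracy, the positive scaling invariance of ReLU; this is not necessarily one of the pathologies the paper presents. The network layout, the `(w, b, v)` triple format, and all function names are assumptions made for illustration only.

```python
def relu(x):
    return max(0.0, x)


def shallow_net(x, params):
    """Realization of a shallow ReLU network from R to R.

    params is a list of (w, b, v) triples (hypothetical layout):
    hidden weight, hidden bias, output weight of one neuron.
    """
    return sum(v * relu(w * x + b) for (w, b, v) in params)


def rescale(params, c):
    """Scale each hidden weight and bias by c > 0 and each output weight by 1/c.

    Since relu(c * z) = c * relu(z) for c > 0, the realization is unchanged.
    """
    return [(c * w, c * b, v / c) for (w, b, v) in params]


theta = [(1.0, -0.5, 2.0), (-3.0, 1.0, 0.5)]
theta_scaled = rescale(theta, 1000.0)

# Compare the two realizations on a grid over [-2, 2].
xs = [i / 10 - 2.0 for i in range(41)]
max_gap = max(abs(shallow_net(x, theta) - shallow_net(x, theta_scaled)) for x in xs)

# Largest coordinate-wise distance between the two parameter vectors.
param_dist = max(
    abs(p - q) for t, s in zip(theta, theta_scaled) for p, q in zip(t, s)
)
```

Here `max_gap` is zero up to floating-point rounding while `param_dist` is large: the two realizations coincide although the parameters are far apart. This is the kind of behavior that motivates restricting the set of parametrizations.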
DNN Expression Rate Analysis of High-Dimensional PDEs: Application to Option Pricing
We analyze approximation rates by deep ReLU networks of a class of multivariate solutions of Kolmogorov equations which arise in option pricing. Key technical devices are deep ReLU architectures capable of efficiently approximating tensor products. Combining this with results concerning the approximation of well-behaved (i.e., fulfilling some smoothness properties) univariate functions, this provides insights into rates of deep ReLU approximation of multivariate functions with tensor structures. We apply this in particular to the model problem given by the price of a European maximum option on a basket of $d$ assets within the Black-Scholes model for European maximum option pricing. We prove that the solution to the $d$-variate option pricing problem can be approximated up to an $\varepsilon$-error by a deep ReLU network with depth $O(\ln(d)\ln(\varepsilon^{-1}) + \ln(d)^2)$ and $O(d^{2+1/n}\varepsilon^{-1/n})$ nonzero weights, where $n \in \mathbb{N}$ is arbitrary (with the constant implied in $O(\cdot)$ depending on $n$). The techniques developed in the constructive proof are of independent interest in the analysis of the expressive power of deep neural networks for solution manifolds of PDEs in high dimension.
ISSN: 0176-4276, ISSN: 1432-094
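As a rough illustration of why a depth that grows logarithmically in d is natural for maximum options, the sketch below writes the payoff max(max_i x_i - K, 0) exactly as a composition of ReLU units arranged in a binary tree. It covers only the payoff, not the paper's construction for the solution of the pricing problem. The function names and the sample prices are assumptions for illustration.

```python
import math


def relu(x):
    return max(0.0, x)


def relu_max2(a, b):
    """max(a, b) written with ReLU units only.

    Uses a = relu(a) - relu(-a) and max(a, b) = a + relu(b - a).
    """
    return relu(a) - relu(-a) + relu(b - a)


def relu_max(values):
    """Binary-tree maximum; returns (maximum, number of pairwise layers)."""
    layer = list(values)
    depth = 0
    while len(layer) > 1:
        nxt = [relu_max2(layer[i], layer[i + 1]) for i in range(0, len(layer) - 1, 2)]
        if len(layer) % 2 == 1:
            nxt.append(layer[-1])  # an unpaired entry is passed through unchanged
        layer = nxt
        depth += 1
    return layer[0], depth


def max_call_payoff(prices, strike):
    """Payoff of a maximum call: relu(max_i prices[i] - strike)."""
    m, depth = relu_max(prices)
    return relu(m - strike), depth


prices = [90.0, 105.0, 98.0, 120.0, 101.0]  # hypothetical asset prices
payoff, depth = max_call_payoff(prices, 100.0)
```

For these five hypothetical prices the tree needs `math.ceil(math.log2(5))` pairwise layers, so the number of layers grows like log2(d) in the number of assets d.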
Deep Neural Network Approximation Theory
This paper develops fundamental limits of deep neural network learning by characterizing what is possible if no constraints are imposed on the learning algorithm and on the amount of training data. Concretely, we consider Kolmogorov-optimal approximation through deep neural networks with the guiding theme being a relation between the complexity of the function (class) to be approximated and the complexity of the approximating network in terms of connectivity and memory requirements for storing the network topology and the associated quantized weights. The theory we develop establishes that deep networks are Kolmogorov-optimal approximants for markedly different function classes, such as unit balls in Besov spaces and modulation spaces. In addition, deep networks provide exponential approximation accuracy (i.e., the approximation error decays exponentially in the number of nonzero weights in the network) of the multiplication operation, polynomials, sinusoidal functions, and certain smooth functions. Moreover, this holds true even for one-dimensional oscillatory textures and the Weierstrass function (a fractal function), neither of which has previously known methods achieving exponential approximation accuracy. We also show that in the approximation of sufficiently smooth functions finite-width deep networks require strictly smaller connectivity than finite-depth wide networks.
ISSN: 0018-9448, ISSN: 1557-965
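The claim of exponential accuracy for multiplication can be made concrete with the sawtooth construction for squaring commonly attributed to Yarotsky. The abstract does not say which construction the paper uses, so the sketch below is only one standard way to obtain such a rate. The function names and the grid size are assumptions for illustration.

```python
def relu(x):
    return max(0.0, x)


def hat(x):
    """Tent function on [0, 1] built from three ReLU units.

    Equals 2x on [0, 1/2] and 2 - 2x on [1/2, 1].
    """
    return 2.0 * relu(x) - 4.0 * relu(x - 0.5) + 2.0 * relu(x - 1.0)


def approx_square(x, m):
    """f_m(x) = x - sum_{s=1}^{m} g_s(x) / 4^s, with g_s the s-fold composition of hat.

    On [0, 1], f_m is the piecewise-linear interpolant of x^2 on a uniform
    grid with 2^m + 1 nodes.
    """
    g = x
    out = x
    for s in range(1, m + 1):
        g = hat(g)
        out -= g / 4.0 ** s
    return out


def max_error(m, n_grid=1000):
    """Largest sampled error |f_m(x) - x^2| on a uniform grid over [0, 1]."""
    return max(
        abs(approx_square(i / n_grid, m) - (i / n_grid) ** 2)
        for i in range(n_grid + 1)
    )


errors = [max_error(m) for m in range(1, 7)]
```

Each extra composition adds a constant number of ReLU units, while the error of the interpolant is bounded by 2^(-2m-2), so the error decays exponentially in the network size. A product can then be recovered from squares through the identity xy = ((x + y)^2 - x^2 - y^2) / 2.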